Untitled

Load Libraries

library(jsonlite) |> suppressPackageStartupMessages()
library(tidyverse) |> suppressPackageStartupMessages()
library(maps) |> suppressPackageStartupMessages()
library(sf) |> suppressPackageStartupMessages()
library(readxl) |> suppressPackageStartupMessages()
library(spatstat) |> suppressPackageStartupMessages()
library(mapview) |> suppressPackageStartupMessages()
library(spdep) |> suppressPackageStartupMessages()
library(spatialreg) |> suppressPackageStartupMessages()
library(patchwork) |> suppressPackageStartupMessages()
set.seed(20001015)

cb_palette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

1.Introduction

I am interested in studying the geographical distribution of martyrs’ memorial facilities in China. According to the Regulations on the Protection and Management of Martyrs’ Memorial Facilities, these facilities include martyrs’ cemeteries, martyrs’ graves, martyrs’ columbariums, walls of martyrs’ names, memorial halls, monuments, memorial pavilions, memorial towers, memorial shrines, memorial statues, and memorial squares (none is more familiar with those facilities than a DC citizen). These facilities can be seen as remnants of the Communist Revolution in China during the early 20th century. Their distribution reflects various aspects, such as the number of individuals from a region who sacrificed their lives for the Communist Revolution, the significance of these locations in the revolution (akin to Lexington’s role in the American Revolutionary War), former revolutionary base areas and battle sites, or perhaps simply the local government’s interest in ideological promotion (e.g., if a city has little connection to the Communist Revolution but still hosts a notable number of such facilities).

The data I am using was found in an Excel document on a Chinese econometrics web forum. However, I have not been able to identify the author or any related publication using this dataset. The dataset includes 3,966 observations and appears sufficiently exclusive, even if it does not cover all those facilities. Additionally, I conducted a sampling-based search and cross-validation for some of the facilities, confirming that the dataset provides accurate facility name, latitude and longitude information, except for approximately 100 facilities whose coordinates were incorrectly placed outside China’s borders. Despite its imperfections, this dataset offers a solid starting point for exploring the geographical distribution of martyrs’ memorial facilities and the spatial characteristics of the Chinese Communist Revolution.

2.Visualize Martyrs’ Memorial Facilities

2.1. Loading China’s Map

Although rnaturalearth can provide China map, it’s not accurate enough, especially for city level. Therefore, instead, I choose to use a China map I found online.

Reading layer `100000_full_city' from data source 
  `https://geo.datav.aliyun.com/areas_v3/bound/100000_full_city.json' 
  using driver `GeoJSON'
Simple feature collection with 477 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 73.50647 ymin: 3.397162 xmax: 135.0946 ymax: 53.56362
Geodetic CRS:  WGS 84

However, this approach also has its challenges. First, the four directly administered municipalities—Beijing, Shanghai, Tianjin, and Chongqing—are subdivided into district-level units (e.g., Beijing is divided into 16 “cities”). However, most statistical data is not available at the district level but treats these four municipalities as single cities. Therefore, the polygons of these four municipalities must first be merged.

###Beijing
beijing <- china %>%
  slice(1:16) %>%                       
  mutate(geometry = st_make_valid(geometry)) %>%  
  summarise(geometry = st_union(geometry)) %>%    
  mutate(name = "北京市") %>%                
  select(name,geometry)

china <- china %>%
  slice(-1:-16) %>%         
  bind_rows(beijing)     

###Tianjin
tianjin <- china %>%
  slice(1:16) %>%                   
  mutate(geometry = st_make_valid(geometry)) %>%  
  summarise(geometry = st_union(geometry)) %>%   
  mutate(name = "天津市") %>%          
  select(name,geometry)

china <- china %>%
  slice(-1:-16) %>%         
  bind_rows(tianjin)       

###Shanghai
shanghai <- china %>%
  slice(71:86) %>%                   
  mutate(geometry = st_make_valid(geometry)) %>%  
  summarise(geometry = st_union(geometry)) %>%    
  mutate(name = "上海市") %>%           
  select(name,geometry)

china <- china %>%
  slice(-71:-86) %>%        
  bind_rows(shanghai)       

###Chongqing
chongqing <- china %>%
  slice(250:287) %>%                     
  mutate(geometry = st_make_valid(geometry)) %>%  
  summarise(geometry = st_union(geometry)) %>%    
  mutate(name = "重庆市") %>%            
  select(name,geometry)

china <- china %>%
  slice(-250:-287) %>%         
  bind_rows(chongqing)   

Another issue is that our analysis does not include Taiwan, Hong Kong, and Macau, as they were not impacted directly by the Chinese Communist Revolution. Therefore, the next step is to remove the polygons for these regions.

china <- china %>% slice(c(-364:-391)) 

china <- china %>% slice(-233) 

china_sf <- sf::st_transform(china, crs = 3857) %>% rename(city = name)

china_map <- mapview(china_sf, legend = FALSE)
china_map

2.2. Load Martyrs’ Memorial Facilities

After loading the data, I conducted an initial visualization of the distribution of martyrs’ memorial facilities in China. Due to the large sample size, I randomly selected 500 points and used mapview to display their distribution on a map.

mf <- read_excel("data/Martyrs’ Memorial Facilities.xlsx")

mf_sf <- st_as_sf(mf, coords = c("longitude", "latitude"), crs = 4326) %>% 
  sf::st_transform(3857)

sample_mf_sf <- slice_sample(mf_sf, n = 500)

mapview(sample_mf_sf) + china_map

2.2. Count Martyrs’ Memorial Facilities in Each City

Here, I used st_join to calculate the number of martyrs’ memorial facilities contained within each city.

china_sf <- st_make_valid(china_sf)

mf_sf <- mf_sf %>% select(-city)

joined_sf <- st_join(mf_sf, china_sf, join = st_within)

point_counts <- joined_sf %>%
  group_by(city) %>%
  summarize(facilities = n())

point_counts <- point_counts %>% st_drop_geometry()

china_sf <- china_sf %>%
  left_join(point_counts, by = "city")

NA should mean no facilities. Therefoere, replace na with 0.

china_sf$facilities[is.na(china_sf$facilities)] <- 0

Then, we can visualize the city-level distribution of martyrs’ memorial facilities.

china_sf |> ggplot(aes(fill=facilities)) +
  geom_sf() +
  scale_fill_viridis_c(option="C") +
  theme_classic()

3. Clan Network and China’s Communist Revolution

Clan networks were a crucial soft organizational structure in traditional Chinese society. In the traditional Chinese governance system, the central government’s administrative control extended only to the county level, leaving governance below that level largely reliant on local clan networks, especially the management of prominent gentry. Clan networks took on many functions that modern governments fulfill today, particularly in the provision of public goods. For instance, they mediated disputes, administered justice (providing “fairness”), and facilitated education. Many traditional educational institutions, such as private academies (sishu), were often funded jointly by clan members and used clan-owned properties as venues.

However, in the traditional narratives of the Chinese Communist Party (CCP), especially before the reform and opening-up era, clans were seen as part of the feudal superstructure and thus symbols of conservatism, backwardness, and autocracy. Clan network leaders were often landlords, who were depicted as the primary “counter-revolutionaries” during the Communist Revolution and the targets of land reform and subsequent political campaigns (e.g., landlords were part of the “Five Black Categories”). This negative portrayal stemmed partly from Marxist historical materialism and partly from the practical needs of revolutionary struggle. In CCP narratives, landlord groups were strong supporters of the Chiang Kai-shek regime and, thus, unforgivable class enemies in the battle for power.

This narrative, however, is largely a construction, as local landlords were not closely tied to the Nationalist Party (KMT). In fact, KMT failed to establish solid control at village level. The negative propaganda against landlords was primarily driven by the political mobilization needs of the revolution. Before the Civil War, the CCP faced the challenge of rallying public support to overthrow the KMT government, which still enjoyed high prestige and legitimacy following the War of Resistance against Japan. China had emerged as a victor in World War II and a permanent member of the United Nations Security Council, marking significant advancements in its international standing. To overcome this legitimacy, the CCP launched a series of land reform campaigns in its controlled areas, inciting peasants to denounce landlords and redistributing their land and assets to the people.

The CCP also sought to link hatred of landlords with hostility toward the Chiang regime. One infamous practice was the Wang Jiang Gan (“looking at Chiang pole”). This involved tying landlords to a pole and gradually hoisting them up while asking if they could “see Chiang Kai-shek.” If the landlord answered no, they were hoisted higher; if they answered yes, they will die from dropping. Regardless of the response, the ultimate fate was often death by dropping.

Despite the CCP’s animosity toward landlords and the traditional clan networks they represented, it is surprising to find that many Communist leaders came from landlord families. Mao Zedong himself was born into a landlord family. This might be because early Communist revolutionaries were intellectuals, and literacy and access to higher education were crucial for exposure to and understanding of Communist ideology. Clan networks traditionally emphasized education, and the relationship between clan support and success in the imperial examination system (keju) has been well-documented. Clans often funded public education (sishu) and, in the early 20th century, could financially support young members pursuing higher or even Western education.

Thus, I plan to study the impact of traditional Chinese clan networks on the Communist Revolution. To operationalize this research, I will use the number of genealogies per capita from the Qing Dynasty and earlier to reflect the historical strength of clan networks, as genealogies are a significant measure of clan culture intensity. Additionally, I will use the number of martyrs’ memorial facilities per capita to measure revolutionary culture.

3.1. Load Genealogy Data

Unfortunately, I also found the genealogical data for each prefecture-level city on the same forum, so I have not been able to identify its author either. However, in subsequent visualization results, we can see that areas with high genealogical density align well with common knowledge about regions with strong clan culture. Jiangsu and Zhejiang (the eastern regions of China near Shanghai) have the highest number of genealogies. Therefore, I have chosen to trust the quality of this dataset. The dataset includes the number of genealogies for each city from the Song, Yuan, Ming, and Qing dynasties, as well as the population in 1990.

gen <- read_excel("data/Genealogy.xlsx")

gen <- gen %>%
  mutate(across(c(Song, Yuan, Ming, Qing), ~replace(., is.na(.), 0))) %>% 
  mutate(genealogies = Song + Yuan + Ming + Qing) %>% 
  select(-Song, -Yuan, -Ming, -Qing)

Here, I joined the new dataset with the main martyrs’ memorial facilities dataset, china_sf.

china_sf <- china_sf %>%
  left_join(gen, by = "city") 
china_sf <- china_sf %>% select(city, province, pop, facilities, genealogies, geometry)

As mentioned above, the map shows that Jiangsu and Zhejiang have the strongest clan culture intensity, which aligns with our common knowledge.

china_sf |> ggplot(aes(fill=genealogies)) +
  geom_sf() +
  scale_fill_viridis_c(option="C") +
  theme_classic()

For the missing values in the genealogical data, we can observe from the map that these are concentrated in non-Han areas in western China. Since clan networks are a phenomenon specific to Han culture, particularly prevalent in southeastern China, it is reasonable to fill the missing values with 0.

china_sf$genealogies[is.na(china_sf$genealogies)] <- 0

3.2. Population Data Manipulation

Since I plan to “standardize” the independent and dependent variables using per capita values, population is also an important variable. Therefore, I first visualized the population distribution in China in 1990.

china_sf |> ggplot(aes(fill=pop)) +
  geom_sf() +
  scale_fill_viridis_c(option="C") +
  theme_classic()

As seen in the figure, there are many missing values in the population data, and it is not feasible to fill them with 0. Therefore, I plan to fill the missing values with the average population of the five nearest neighbors.

china_sf$centroid <- st_centroid(china_sf$geometry)

china_nb <- knn2nb(knearneigh(china_sf$centroid, k = 5))

china_listw <- nb2listw(china_nb, style = "W")

china_sf$pop_mean_neighbors <- sapply(1:nrow(china_sf), function(i) {
  neighbors <- china_nb[[i]]
  neighbor_pops <- china_sf$pop[neighbors]
  mean_pop <- mean(neighbor_pops, na.rm = TRUE)
  return(mean_pop)
})

china_sf$pop <- ifelse(is.na(china_sf$pop), china_sf$pop_mean_neighbors, china_sf$pop)

china_sf |> ggplot(aes(fill=pop)) +
  geom_sf() +
  scale_fill_viridis_c(option="C") +
  theme_classic()

As shown in the figure, all the missing population values have now been filled, and the filled areas appear very smooth without producing outliers that differ significantly from the surrounding cities.

Now, here is no missing values in china_sf

china_sf <- china_sf %>% select(-centroid, -pop_mean_neighbors)

Then, we can use population to “standardize” two major variables.

3.3. Preliminary Regression

However, in the subsequent regression, I attempted to use the log form of the two variables, but taking the log of 0 would result in a large number of missing values. Therefore, I first added 1 to both main variables. This approach minimizes estimation bias while significantly preserving the sample size.

china_sf <- china_sf %>% 
  mutate(facap = 100*(1+facilities)/pop) %>% 
  mutate(gecap = 100*(1+genealogies)/pop)

Here, I attempt to gain an initial intuition about the relationship between the two variables through a scatter plot.

ggplot(china_sf, aes(x = gecap, y = facap)) +
  geom_point() +  
  labs(x = "Genealogies", y = "Facilities", title = "Scatter plot of Facilities vs Genealogies") +
  theme_minimal() 

This plot looks strange due to the presence of a severe outlier. Therefore, the next step is to identify and remove this outlier. The method I chose is to replace the outlier’s two variables with their respective medians.

facap_above_3000 <- china_sf %>% filter(facap > 3000)

facap_above_3000
Simple feature collection with 1 feature and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 8726952 ymin: 3462402 xmax: 9610049 ymax: 4281466
Projected CRS: WGS 84 / Pseudo-Mercator
      city   province      pop facilities genealogies
1 阿里地区 西藏自治区 0.000613          7           0
                        geometry   facap    gecap
1 POLYGON ((9581332 4281466, ... 1305057 163132.1
china_sf <- china_sf %>%
  mutate(
    facap = ifelse(facap > 3000, median(facap), facap),
    gecap = ifelse(gecap > 3000, median(gecap), gecap)
  )
ggplot(china_sf, aes(x = gecap, y = facap)) +
  geom_point() +  
  labs(x = "Genealogies", y = "Facilities", title = "Scatter plot of Facilities vs Genealogies") +
  theme_minimal() 

Now, we can finally observe the relationship between the two variables. However, there does not seem to be a clear linear relationship between them. Therefore, I plan to take the log of both variables to examine any potential non-linear relationship between them.

china_sf <- china_sf %>% 
  mutate(lfacap = log(facap)) %>% 
  mutate(lgecap = log(gecap))
ggplot(china_sf, aes(x = lgecap, y = lfacap)) +
  geom_point() +  
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), color = "blue", se = FALSE) +  
  labs(x = "Genealogies", y = "Facilities", title = "Scatter plot of Facilities vs Genealogies") +
  theme_minimal()

It is clear that there is a significant inverted U-shaped relationship between the two variables. This is actually not surprising, as I previously discussed the relationship between clan culture and education, and the importance of education in accessing communist ideology. Therefore, when clan culture is weak, increasing clan culture can improve educational opportunities, which in turn increases the likelihood of individuals being exposed to communist ideas, leading to a positive correlation between the two variables. However, when clan power is strong, extreme ideologies like communism are suppressed, especially because clan culture often carries strong conservative tendencies.

Therefore, I hypothesize: H1: The clan culture has a negative relationship with red culture/China’ communist revolution.

china_sf <- china_sf %>% 
  mutate(lgecap2 = lgecap^2)

Next, to test the significance of this inverted U-shaped relationship, I performed a simple regression that includes the squared term of the independent variable.

ols_fit <- lm(lfacap ~ lgecap + lgecap2, data=china_sf)

result_df <- broom::tidy(ols_fit)

result_df$p.value <- formatC(result_df$p.value, format = "f", digits = 3)

result_df
# A tibble: 3 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl> <chr>  
1 (Intercept)   1.30      0.0606     21.5  0.000  
2 lgecap        0.266     0.0611      4.35 0.000  
3 lgecap2      -0.0559    0.0143     -3.90 0.000  

The results show that both the independent variable and its squared term are significant at the 0.001 level. Therefore, H1 is supported.

Here, I visualized the two logged variables, so we can see they relationship graphically.

china_lgecap <- china_sf |> ggplot(aes(fill=lgecap)) +
  geom_sf() +
  scale_fill_viridis_c(option="C") +
  theme_classic()

china_lfacap <- china_sf |> ggplot(aes(fill=lfacap)) +
  geom_sf() +
  scale_fill_viridis_c(option="C") +
  theme_classic()

china_lgecap+china_lfacap

Besides, to look into the relationship between two variables more closely, I try to zoom in southeast China, where is the origon of China’s communist revolution.

china_sf$province[china_sf$city == "襄阳市"] <- "湖北省"

southeast_china_sf <- china_sf %>% filter(province %in% c("江西省", "福建省", "广东省", "安徽省", "浙江省", "江苏省", "湖南省", "湖北省", "上海市"))
southeast_lgecap <- southeast_china_sf |> ggplot(aes(fill=lgecap)) +
  geom_sf() +
  scale_fill_viridis_c(option="C") +
  theme_classic()

southeast_lfacap <- southeast_china_sf |> ggplot(aes(fill=lfacap)) +
  geom_sf() +
  scale_fill_viridis_c(option="C") +
  theme_classic()

southeast_lgecap+southeast_lfacap

It’s clear that clan networks are well-developed throughout southeastern China, especially in the Zhejiang region in the eastern part. On the other hand, martyr memorial facilities are highly concentrated in Jiangxi Province (the central part of the map), while their density is much lower in the western province of Hunan and the northeastern provinces of Jiangsu and Zhejiang. Interestingly, there is a distinct contrast between Jiangxi and Hunan, a phenomenon that is further discussed below. But overall, we can see a negative relationship between clan culture and red culture in Jiangsu and Zhejiang, where exist the strongest clan culture, supporting H1 further.

3.4. Spatial Regression

china_sf$resid <- ols_fit$residuals
china_sf |> ggplot(aes(x=lgecap, y=resid)) +
  geom_point() +
  geom_hline(yintercept=0, color=cb_palette[1], linewidth=1) +
  theme_classic()

However, the visualization results also raise concerns about spatial autocorrelation. We found that both clan culture and revolutionary culture exhibit strong clustering. Therefore, it is necessary to test whether spatial autocorrelation exists in the regression above.

i_resid <- moran.test(china_sf$resid, china_listw)

i_resid

    Moran I test under randomisation

data:  china_sf$resid  
weights: china_listw    

Moran I statistic standard deviate = 12.164, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.3763306541     -0.0027397260      0.0009712037 

Moran’s I of residual is 0.376, which is not too strong, but still need to be handled. I run a spatial regression to reduce the bias caused by spatial autocorrelation.

ldv_fit <- lagsarlm(lfacap ~ lgecap + lgecap2, data=china_sf, listw=china_listw)

ldv_result_df <- broom::tidy(ldv_fit)

ldv_result_df$p.value <- formatC(ldv_result_df$p.value, format = "f", digits = 3)

ldv_result_df
# A tibble: 4 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl> <chr>  
1 rho           0.584     0.0532     11.0  0.000  
2 (Intercept)   0.522     0.0890      5.87 0.000  
3 lgecap        0.173     0.0518      3.35 0.001  
4 lgecap2      -0.0309    0.0121     -2.56 0.010  

The independent variable and its squared term remain significant, but their significance has decreased because part of the data’s information is used to estimate the impact of spatial autocorrelation. Meanwhile, rho is significant at the 0.001 level, indicating that spatial autocorrelation does indeed exist.

china_sf$sp_resid <- ldv_fit$resid
china_sf |> ggplot(aes(x=lgecap, y=sp_resid)) +
  geom_point() +
  geom_hline(yintercept=0, color=cb_palette[1], linewidth=1) +
  theme_classic()

i_resid <- moran.test(china_sf$sp_resid, china_listw)

i_resid

    Moran I test under randomisation

data:  china_sf$sp_resid  
weights: china_listw    

Moran I statistic standard deviate = -0.59839, p-value = 0.7252
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
    -0.0213668594     -0.0027397260      0.0009689886 

After controlling for spatial autocorrelation, the Moran’s I of the residuals decreased to -0.02, which is very close to 0, indicating that spatial autocorrelation has been effectively addressed.

3.5. Equilibrium Effects

Given the spatial regression coefficient \(\rho\), and the non-spatial regression coefficient vector \(\boldsymbol\beta = (\beta_0, \beta_1, \beta_2)\), the long-run equilibrium value for \(Y\) is given by

\[ Y^* = (I - \rho\mathbf{W})^{-1}(X\boldsymbol\beta), \]

Using the formula above, we can calculate what happens to the number of martyr memorial facilities in Ganzhou, the capital of the Chinese Soviet Republic, and its surrounding areas when the clan network power in the region doubles. To focus this study on the Chinese Communist Revolution, I limit the research scope to southeastern China. This is because many memorial facilities in the north commemorate martyrs from the War of Resistance Against Japan, which are less directly related to the Communist Revolution. Southeastern China, however, was the birthplace of the early armed revolution of the Chinese Communist Party.

china_sf <- china_sf %>%
  mutate(southeast = if_else(province %in% c("江西省", "福建省", "广东省", "安徽省", "浙江省", "江苏省", "湖南省", "湖北省", "上海市"), 1, 0))
target_city <- "赣州市"
# Create a dummy variable in it_sf which is 1 for this target collegio
china_sf <- china_sf |> mutate(
  is_target = factor(city == target_city)
)
# Use sf to create a single geometry which is just a circle around the centroid
# of the target collegio. Since 50km was our cutoff for being considered a
# neighbor, we use 50km here for the radius of the circle
circ_radius <- 200000
circ_sf <- china_sf |> filter(is_target == TRUE) |> sf::st_centroid() |>
  sf::st_buffer(circ_radius) |> sf::st_boundary()
Warning: st_centroid assumes attributes are constant over geometries
china_sf %>%  filter(southeast  == 1) %>% ggplot() +
  geom_sf(aes(fill=is_target)) +
  geom_sf(data=circ_sf, linetype="dashed") +
  scale_fill_manual(values=c(cb_palette[1], cb_palette[2])) +
  theme_classic()

This diagram is a conceptual illustration, showing the target city where the independent variable is doubled in our “laboratory” and the potential spillover effects of this change on surrounding cities.

which(china_sf$is_target == TRUE)
[1] 126

Here, I created a counterfactual variable for the number of genealogies per capita. The only difference between this counterfactual variable and the original variable is that the value in row 126, corresponding to Ganzhou City, is doubled from the original value.

X_obs <- cbind(1, china_sf$lgecap, china_sf$lgecap2)

china_sf$gecap_cf <- china_sf$gecap
china_sf$gecap_cf[126] <- china_sf$gecap_cf[126] * 2

X_cf <- cbind(1, log(china_sf$gecap_cf), log(china_sf$gecap_cf)^2) 

which(apply(X_obs != X_cf, 1, any))
[1] 126

Here, we can use the formula to calculate the difference between the equilibrium values of the dependent variable derived from the original independent variable and the counterfactual variable.

compute_y_eq <- function(X, beta, rho, nb_obj) {
  # Create an identity matrix
  I <- diag(nrow(X))
  W <- nb2mat(nb_obj, style="W")
  y_eq <- solve(I - rho * W) %*% (X %*% beta)
  return(y_eq)
}

estimates <- ldv_result_df$estimate
beta <- c(estimates[2], estimates[3], estimates[4])
rho <- estimates[1]

lfacap_eq <- compute_y_eq(X_obs, beta, rho, china_nb)
lfacap_cf <- compute_y_eq(X_cf, beta, rho, china_nb)

china_sf$lfacap_diff <- as.vector(lfacap_cf - lfacap_eq)

We can see that the differences are less than 0. Therefore, I visualized the regions where the size of the change in the dependent variable is greater than 0.001.

china_sf |> filter(southeast == 1) |>
    mutate(lfacap_diff = ifelse(lfacap_diff > -0.001, NA, lfacap_diff)) |>
    ggplot(aes(fill=lfacap_diff)) +
    geom_sf() +
    scale_fill_viridis_c(option="C") +
    theme_classic()

As shown in the figure, doubling clan culture leads to a decrease in the logged number of per capita martyr memorial facilities. Moreover, this negative effect exhibits a spillover effect that gradually diminishes in surrounding cities.

4. The Distribution of Martyrs’ Memorial Facilities

4.1. Visualize The Density of Martyrs’ Memorial Facilities

# generate the window
southeast_china_sfc <- southeast_china_sf %>% 
  select(geometry) %>% 
  st_buffer(300) %>%
  st_union()

mapview(southeast_china_sfc)
southeast_mf_sf <- mf_sf[st_within(mf_sf, southeast_china_sfc, sparse = FALSE), ]

mf_ppp <- as.ppp(sf::st_as_sfc(southeast_mf_sf), W=as.owin(southeast_china_sfc))
Warning: data contain duplicated points
mf_density <- density(mf_ppp, sigma=50000)

plot(mf_density)

4.2. Visualize The Population Density

To depict the population density, we first need to sample point data based on the proportion of each city’s population relative to the entire region. Here, I choose to sample 3000 points.

nsplit = function(X,n){
  p = X/sum(X)
  diff(round(n*cumsum(c(0,p))))
}

southeast_china_sf$sample <- nsplit(southeast_china_sf$pop, 3000) 

southeast_china_sf <- southeast_china_sf %>%
  mutate(sample = if_else(sample == 0, 1, sample))

sampled_points <- st_sample(southeast_china_sf, size = southeast_china_sf$sample, type = 'random')

ggplot() +
  geom_sf(data = southeast_china_sf, fill = "lightblue", color = "black", alpha = 0.5) +  
  geom_sf(data = sampled_points, color = "red", size = 0.5) +  
  theme_minimal() +
  labs(title = "Randomly Sampled Points by Population Weight")

Now, we can construct the population intensity function.

pop_ppp <- as.ppp(sampled_points, W=as.owin(southeast_china_sfc))
pop_density <- density(pop_ppp)

plot(pop_density)

4.3. Revolution and Population

Theoretically, communist revolutions are more likely to develop in areas with low population density, as such regions are often less suitable for human settlement due to rugged terrain. However, rugged terrain can benefit revolutionaries by providing hiding spots and opportunities to use the landscape to conduct guerrilla warfare against government forces, especially when they are outnumbered and outgunned. Therefore, I propose the following hypothesis:

H2: Areas with lower population density will exhibit a higher density of martyrs’ memorial facilities.

num_regions <- 3
region_labels <- c("Low", "Medium", "High")
pop_vals <- pop_density
pop_quant <- quantile(pop_vals, probs=(0:num_regions)/num_regions, na.rm=TRUE)
pop_cut <- cut(pop_vals, breaks=pop_quant, labels=region_labels)
pop_areas <- tess(image=pop_cut)
plot(pop_areas)

obs_mf_counts <- quadratcount(mf_ppp, tess=pop_areas) |> as.vector()
names(obs_mf_counts) <- region_labels
obs_mf_counts
   Low Medium   High 
   533    519    502 

After dividing the region into high, medium, and low population density areas, the next step involves using Monte Carlo Simulation to simulate the distribution of martyrs’ memorial facilities entirely based on population density 999 times. I then plotted the probability distribution of the number of martyrs’ memorial facilities in low-population-density areas from the simulations and compared it to the actual observed value.

compute_quadrat_counts <- function(sim_ppp) {
  sim_counts <- quadratcount(sim_ppp, tess=pop_areas) |> as.vector()
  names(sim_counts) <- region_labels
  return(sim_counts)
}

gen_sims_ppp <- function(num_sims) {
  prot_sims <- spatstat.random::rpoint(
    n = nrow(southeast_mf_sf),
    f = pop_density,
    nsim = num_sims
  )
  return(prot_sims)
}

full_sims_list <- gen_sims_ppp(num_sims = 999)
full_sim_area_counts <- lapply(X=full_sims_list, FUN=compute_quadrat_counts)
full_count_df <- as_tibble(full_sim_area_counts) |> t() |> as_tibble()
Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
`.name_repair` is omitted as of tibble 2.0.0.
ℹ Using compatibility `.name_repair`.
colnames(full_count_df) <- region_labels
full_count_df |> head()
# A tibble: 6 × 3
    Low Medium  High
  <int>  <int> <int>
1   398    472   684
2   428    443   683
3   360    483   711
4   375    495   684
5   401    493   660
6   406    505   643
mc_df <- bind_rows(full_count_df, obs_mf_counts)
full_count_df |> ggplot(aes(x=Low)) +
  geom_density(fill=cb_palette[2], alpha=0.5) +
  geom_vline(xintercept = obs_mf_counts["Low"], linetype="dashed", color=cb_palette[1]) +
  theme_classic()

As shown in the figure, the actual number of martyrs’ memorial facilities in low-population-density areas is 533, while the simulated values are concentrated around 390. Therefore, if the null hypothesis holds—i.e., martyrs’ memorial facilities are distributed according to population density—the observed value is extremely unlikely to be observed, as it lies at the far right tail of the probability distribution. This indicates that martyrs’ memorial facilities are significantly concentrated in low-population-density areas. H2 is supported.

q4_more_extreme_df <- mc_df[mc_df$Low >= obs_mf_counts["Low"],]
q4_prop_more_extreme <- nrow(q4_more_extreme_df) / nrow(mc_df)
q4_prop_more_extreme
[1] 0.001

Similarly, we can calculate the proportion of values as extreme as or more extreme than the observed value relative to the total sample (1000). The result is 0.001, indicating that in 999 simulations, no values as extreme as or more extreme than the observed value were found. This provides stronger support for H2.

4.4. Distance to Capital and Revolution

The distance from the capital reflects the cost for the government to effectively exercise administrative control and suppress the revolution in that area. The farther a location is from the capital, the higher the cost for the government to suppress the revolution. On the other hand, the closer a location is to the capital, the harder it is for revolutionary parties to spread their ideology, develop party members, organize, and survive. Therefore, I use the distance from the Mansion of the President in Nanjing to measure the distance from the capital and propose the hypothesis:

H3: Martyrs’ memorial facilities tend to be distributed in areas farther from the capital.

mp_coords <- c(32.04484, 118.79236)
mp_df <- tibble(name=c("Mansion of the President of Nanjing"), lat=mp_coords[1], lon=mp_coords[2])
mp_sf <- sf::st_as_sf(
  mp_df,
  coords=c("lon", "lat"),
  crs=4326
) |> sf::st_transform(3857)
mapview(mp_sf)

Next, I will compare the actual average distance from the capital to the martyrs’ memorial facilities with the average distance from the capital obtained from the previous Monte Carlo simulation results in the same way.

all_distances <- southeast_mf_sf |> sf::st_distance(mp_sf)
obs_mean_dist <- as.numeric(mean(all_distances))
obs_mean_dist
[1] 763375.8

The actual average distance from the capital to the martyrs’ memorial facilities is 763375.8 m.

compute_mean_dist <- function(sim_ppp) {
  sim_prot_sf <- sim_ppp |> sf::st_as_sf() |> sf::st_set_crs(3857) |> filter(label == "point")
  sim_dists <- sim_prot_sf |> sf::st_distance(mp_sf)
  return(mean(as.numeric(sim_dists)))
}

sim_dists <- lapply(X=full_sims_list, FUN=compute_mean_dist)
sim_dists |> head()
$`Simulation 1`
[1] 648980.6

$`Simulation 2`
[1] 638051.6

$`Simulation 3`
[1] 646935.5

$`Simulation 4`
[1] 662433.3

$`Simulation 5`
[1] 654923.4

$`Simulation 6`
[1] 672673.5
sim_dist_df <- unlist(sim_dists) |> as_tibble()
obs_dist_df <- tibble(value = obs_mean_dist)
mc_dist_df <- bind_rows(sim_dist_df, obs_dist_df)
mc_dist_df |> ggplot(aes(x=value)) +
  geom_density(fill=cb_palette[2], alpha=0.5) +
  geom_vline(xintercept = obs_mean_dist, linetype="dashed", color=cb_palette[1]) +
  theme_classic()

q5_more_extreme_df <- mc_dist_df[mc_dist_df$value >= obs_mean_dist,]
q5_more_extreme_prop <- nrow(q5_more_extreme_df) / nrow(mc_dist_df)
q5_more_extreme_prop
[1] 0.001

Both graph and p-value support H3, martyrs’ memorial facilities tend to be distributed in areas farther from the capital.

4.5. Province Boundary and Revolutionary Strategy

Jiangxi Province served as the base of the Chinese Soviet Republic and the cradle of the CCP’s armed revolution. Observing the boundaries of Jiangxi Province, I discovered an intriguing phenomenon. Along the border between Jiangxi and Hunan Province (to the west of Jiangxi), the number of martyrs’ memorial facilities on the Hunan side of the border is sparse and highly dispersed, particularly near the boundary. In contrast, on the Jiangxi side, the distribution of martyrs’ memorial facilities is much denser, especially near the border. This observation might provide insights into the strategic considerations of the Chinese Communist Revolution.

In Mao Zedong’s 1928 essay “Why is it that Red Political Power can Exist in China?” he elaborates on the survival strategy of Red political power:

“The long-term survival inside a country of one or more small areas under Red political power completely encircled by a White regime is a phenomenon that has never occurred anywhere else in the world. There are special reasons for this unusual phenomenon. It can exist and develop only under certain conditions. … For this unusual phenomenon can occur only in conjunction with another unusual phenomenon, namely, war within the White regime. It is a feature of semicolonial China that, since the first year of the Republic [1912] the various cliques of old and new warlords have waged incessant wars against one another, supported by imperialism from abroad and by the comprador and landlord classes at home. … The prolonged splits and wars within the White regime provide a condition for the emergence and persistence of one or more small Red areas under the leadership of the Communist Party amidst the encirclement of the White regime. The independent regime carved out on the borders of Hunan and Jiangxi Provinces is one of many such small areas.”

Therefore, I plan to examine the relationship between the distribution of martyrs’ memorial facilities and their distance from the Jiangxi Province border. To begin, I visualized the boundary of Jiangxi Province alongside the martyrs’ memorial facilities.

jiangxi_sf <- southeast_china_sf %>% filter(province == "江西省")
jiangxi_bd <- jiangxi_sf %>% st_union %>% st_boundary() %>% st_as_sf()
mapview(southeast_mf_sf) + mapview(jiangxi_bd)

Next, I calculated the distance of all martyrs’ memorial facilities from the Jiangxi Province border, recording distances within the province as negative values. Finally, I plotted the density distribution of these distances.

jiangxi_polygon <- st_cast(jiangxi_bd, "POLYGON")

in_sf <- southeast_mf_sf %>%
  filter(st_intersects(southeast_mf_sf, jiangxi_polygon, sparse = FALSE))
Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
ℹ Please use one dimensional logical vectors instead.
out_sf <- southeast_mf_sf %>%
  filter(!st_intersects(southeast_mf_sf, jiangxi_polygon, sparse = FALSE))
in_sf$distance <- in_sf %>% st_distance(jiangxi_bd) %>% as.vector
in_sf <- in_sf %>% mutate(distance = -distance)
out_sf$distance <- out_sf %>% st_distance(jiangxi_bd) %>% as.vector

distance = rbind(in_sf, out_sf)
ggplot(distance, aes(x = distance)) +
  geom_density(fill = cb_palette[2], alpha = 0.5) +
  labs(
    title = "Density Plot of Distance",
    x = "Distance",
    y = "Density"
  ) +
  geom_vline(xintercept = 0, linetype="dashed", color=cb_palette[1]) +
  theme_classic()

As shown in the figure, martyrs’ memorial facilities are densely concentrated near the inner side of Jiangxi Province’s border (the region to the left of x = 0). Within Jiangxi Province, density decreases the farther from the border. A similar pattern is observed outside Jiangxi, but around 200 km from the border, a new peak emerges, indicating the presence of other “red regimes” apart from the Central Soviet Area.

From this, we can draw two preliminary conclusions. First, China’s communist revolution can be characterized as a “border revolution,” with red regimes predominantly established at the intersection of political boundaries. Second, it is a “dispersed revolution,” where apart from the Central Soviet Area, other red regimes emerged at an “optimal” distance from Jiangxi’s border. This spacing minimized the risk of suppression while remaining close enough to facilitate communication and mutual support among the bases.

distance_filtered <- distance %>% filter(distance < abs(min(distance)))

ggplot(distance_filtered, aes(x = distance)) +
  geom_density(fill = cb_palette[2], alpha = 0.5) +
  labs(
    title = "Density Plot of Distance",
    x = "Distance",
    y = "Density"
  ) +
  geom_vline(xintercept = 0, linetype="dashed", color=cb_palette[1]) +
  theme_classic()

When zooming in on the observation window to ensure that the maximum distances inside and outside the provincial border are equal, the above patterns remain consistent.

Next, I plan to use st_buffer in increments of 5,000 meters to calculate the number of martyrs’ memorial facilities within various distance ranges inside and outside Jiangxi Province’s border (e.g., 0–5,000 meters, 5,000–10,000 meters, and so on). This approach will allow us to examine the distribution patterns of China’s red regimes more closely.

generate_buffer_counts <- function(times) {
  distance <- 5000 * times
  jiangxi_buffer <- jiangxi_bd %>% st_buffer(dist = distance)

  in_buffer <- sum(st_intersects(in_sf, jiangxi_buffer, sparse = FALSE))
  out_buffer <- sum(st_intersects(out_sf, jiangxi_buffer, sparse = FALSE))
  
  df <- data.frame(
    distance = c(-distance, distance),
    counts = c(in_buffer, out_buffer))
  
  return(df)
}
buffer_counts <- generate_buffer_counts(1)
buffer_counts
  distance counts
1    -5000     18
2     5000      5
generate_between_buffer_counts <- function(times) {
  big_distance <- 5000 * times
  small_distance <- 5000 * (times-1)
  big_buffer <- jiangxi_bd %>% st_buffer(dist = big_distance)
  small_buffer <- jiangxi_bd %>% st_buffer(dist = small_distance)
  
  buffer_difference <- st_difference(big_buffer, small_buffer)
  
  in_buffer <- sum(st_intersects(in_sf, buffer_difference, sparse = FALSE))
  out_buffer <- sum(st_intersects(out_sf, buffer_difference, sparse = FALSE))
  
  df <- data.frame(
    distance = c(-big_distance, big_distance),
    counts = c(in_buffer, out_buffer))
  
  return(df)
}
generate_between_buffer_counts(2)
  distance counts
1   -10000     50
2    10000      7
for (times in 2:10) {
  # Get the dataframe for this value of times
  temp_df <- generate_between_buffer_counts(times)
  
  # Combine it with the existing buffer_counts dataframe
  buffer_counts <-  rbind(buffer_counts, temp_df)
}

buffer_counts <<- buffer_counts

buffer_counts
   distance counts
1     -5000     18
2      5000      5
3    -10000     50
4     10000      7
5    -15000     27
6     15000     10
7    -20000     29
8     20000      9
9    -25000     15
10    25000      7
11   -30000     25
12    30000     11
13   -35000     18
14    35000     14
15   -40000     21
16    40000     10
17   -45000     17
18    45000      6
19   -50000     10
20    50000      9
ggplot(buffer_counts, aes(x = distance, y = counts)) +
  geom_line(color = cb_palette[2], size = 1) +
  geom_point(color = cb_palette[1], size = 2) +  # Add points to highlight values
  labs(
    title = "Line Plot of Counts by Distance",
    x = "Distance",
    y = "Counts"
  ) +
  theme_minimal() +
  geom_vline(xintercept = 0, linetype="dashed", color=cb_palette[1]) +
  theme(
    plot.title = element_text(hjust = 0.5)  # Center the title
  )
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

As shown in the figure, there are 18 facilities within 0–5,000 meters inside the provincial border, while the number reaches an astonishing 50 in the 5,000–10,000 meter range. However, outside the provincial border, the numbers are only 5 and 7 in these respective distance ranges. This stark contrast between inside and outside the border provides insights into the strategic considerations of the Chinese Communist revolution. The red regimes were primarily concentrated 5,000–10,000 meters inside the provincial border. This distance likely created a buffer zone from external provinces, protecting the regime from direct attacks by warlords outside Jiangxi Province. At the same time, the distance was short enough to allow the regime to retreat quickly to neighboring provinces when under attack from warlords within Jiangxi Province.

To some extent, the “free-rider problem” and “tragedy of the commons” among warlords in different provinces created a survival space for the Chinese Communist regime. While the optimal solution for the warlords would have been to unite and eliminate the red regime, once the Communists were driven out of their territory, warlords often stopped pursuing them further, as the conflict then shifted to another warlord’s domain. Moreover, warlords were generally reluctant to expend their own military resources or hoped neighboring warlords would take on the task of eliminating the Communist regime. This “free-rider” behavior ultimately resulted in no single warlord committing fully to suppression efforts, allowing the red regime to survive.